climate <- read.csv ("climate_spending.csv", header = TRUE)
library(ggplot2)
summary(climate)
## department year gcc_spending
## Agriculture :18 Min. :2000 Min. :3.113e+07
## All Other :18 1st Qu.:2004 1st Qu.:7.604e+07
## Commerce (NOAA):18 Median :2008 Median :1.552e+08
## Energy :18 Mean :2008 Mean :3.465e+08
## Interior :18 3rd Qu.:2013 3rd Qu.:3.209e+08
## NASA :18 Max. :2017 Max. :1.676e+09
## NSF :18
attach (climate)
names (climate)
## [1] "department" "year" "gcc_spending"
Scatter plot of gcc_spending (y) against year (x), colored by department:
ggplot(climate, aes(x = year, y = gcc_spending, color = department)) +
  geom_point()
NASA's gcc_spending is much higher than that of the other departments. Spending fluctuates between 2000 and 2017, and the highest values appear between 2000 and 2003.
Box plot of gcc_spending (y) against year (x), colored by department:
ggplot(climate, aes(x = year, y = gcc_spending, color = department)) +
  geom_boxplot()
The box plot shows large differences in the data for NASA, drawn at 2014, and, interestingly, only small differences for Interior, drawn at 2011.
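A note on reading the box plot above: year is continuous and color = department defines the groups, so ggplot2 draws one box per department rather than one per year, and the horizontal position of a box does not identify a single year. A minimal sketch of a box plot with a discrete x axis follows. The data frame toy and all of its values are invented for illustration; only the column names mirror climate.

```r
library(ggplot2)

# Hypothetical stand-in for the climate data (all values are invented).
set.seed(1)
toy <- data.frame(
  department   = rep(c("Energy", "NASA", "NSF"), each = 18),
  year         = rep(2000:2017, times = 3),
  gcc_spending = c(rnorm(18, 1.2e8, 1e7),
                   rnorm(18, 1.4e9, 1e8),
                   rnorm(18, 2.5e8, 2e7))
)

# One box per department: the discrete variable goes on the x axis,
# so each box summarises that department's spending across all years.
p_box <- ggplot(toy, aes(x = department, y = gcc_spending, color = department)) +
  geom_boxplot()
print(p_box)
```

With this layout the height of each box can be compared directly between departments.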
Line plot of gcc_spending (y) against year (x), colored by department:
ggplot(climate, aes(x = year, y = gcc_spending, color = department)) +
  geom_line()
The line plot shows the climate data as one time series per department. Spending fluctuates in all departments, but NASA's values are high compared with the other departments.
# Work on a copy and add the natural log of spending as a new column
df <- climate
df$lngcc_spending <- log(df$gcc_spending)
summary(df)
## department year gcc_spending lngcc_spending
## Agriculture :18 Min. :2000 Min. :3.113e+07 Min. :17.25
## All Other :18 1st Qu.:2004 1st Qu.:7.604e+07 1st Qu.:18.15
## Commerce (NOAA):18 Median :2008 Median :1.552e+08 Median :18.86
## Energy :18 Mean :2008 Mean :3.465e+08 Mean :19.02
## Interior :18 3rd Qu.:2013 3rd Qu.:3.209e+08 3rd Qu.:19.59
## NASA :18 Max. :2017 Max. :1.676e+09 Max. :21.24
## NSF :18
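The effect of the log transform can be checked by hand from the summary above. The sketch below uses only the minimum and maximum of gcc_spending printed in that summary; the object names are illustrative.

```r
# Minimum and maximum of gcc_spending, copied from summary(df) above.
spending_range <- c(min = 3.113e+07, max = 1.676e+09)

# Natural log, the same transform used for lngcc_spending.
log_range <- log(spending_range)
print(round(log_range, 2))

# On the raw scale the maximum is roughly 54 times the minimum;
# on the log scale the two differ by roughly 4 units.
raw_ratio <- unname(spending_range["max"] / spending_range["min"])
log_diff  <- unname(diff(log_range))
print(c(raw_ratio = raw_ratio, log_diff = log_diff))
```

The rounded logs agree with the Min. and Max. reported for lngcc_spending (17.25 and 21.24).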
# color is mapped in geom_point() only, so the fit uses all years in a panel
ggplot(climate, aes(x = year, y = gcc_spending)) +
  geom_point(aes(color = factor(year))) + scale_x_log10() +
  geom_smooth(method = "lm") + facet_wrap(~department)
The linear-model plot, with year colored as a factor, shows that NASA has high values that fluctuate from 2000 to 2017.
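# color is mapped in geom_point() only, so the fit uses all years in a panel
ggplot(climate, aes(x = year, y = gcc_spending)) +
  geom_point(aes(color = factor(year))) + scale_x_log10() +
  geom_smooth(method = "glm") + facet_wrap(~department)
The GLM plot gives the same result as the linear-model plot. This is expected: with its default Gaussian family, glm fits the same model as lm.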
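The agreement between the two smoothers can be checked outside ggplot. The sketch below fits lm and glm (with its default Gaussian family and identity link) to the same data and compares the coefficients. The data frame toy and its values are invented for illustration.

```r
# Hypothetical data: a linear trend plus noise (all values are invented).
set.seed(42)
toy <- data.frame(year = 2000:2017)
toy$spending <- 1e8 + 5e6 * (toy$year - 2000) + rnorm(18, sd = 1e7)

fit_lm  <- lm(spending ~ year, data = toy)
fit_glm <- glm(spending ~ year, data = toy)  # family defaults to gaussian

# Compare the estimated intercept and slope of the two fits.
same_coefs <- all.equal(coef(fit_lm), coef(fit_glm))
print(same_coefs)
```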
glm_climate <- glm(year ~ gcc_spending + department, family = gaussian, data = climate)
summary (glm_climate)
##
## Call:
## glm(formula = year ~ gcc_spending + department, family = gaussian,
## data = climate)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -11.2060 -4.2741 0.4521 4.3017 9.1212
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2.007e+03 1.348e+00 1488.739 < 2e-16 ***
## gcc_spending 1.698e-08 6.219e-09 2.730 0.00731 **
## departmentAll Other 7.815e-02 1.733e+00 0.045 0.96412
## departmentCommerce (NOAA) -3.450e+00 2.145e+00 -1.608 0.11042
## departmentEnergy -1.596e+00 1.829e+00 -0.872 0.38473
## departmentInterior 7.265e-01 1.753e+00 0.414 0.67938
## departmentNASA -2.277e+01 8.519e+00 -2.673 0.00859 **
## departmentNSF -3.429e+00 2.141e+00 -1.602 0.11182
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for gaussian family taken to be 27.03431)
##
## Null deviance: 3391.5 on 125 degrees of freedom
## Residual deviance: 3190.0 on 118 degrees of freedom
## AIC: 782.74
##
## Number of Fisher Scoring iterations: 2
The GLM shows that gcc_spending (p = 0.007) and the NASA department (p = 0.009) are significantly associated with year. The stars mark the significance level of each coefficient.
anova(glm_climate)
## Analysis of Deviance Table
##
## Model: gaussian, link: identity
##
## Response: year
##
## Terms added sequentially (first to last)
##
##
## Df Deviance Resid. Df Resid. Dev
## NULL 125 3391.5
## gcc_spending 1 5.318 124 3386.2
## department 6 196.134 118 3190.0
The analysis of deviance table shows that gcc_spending uses 1 degree of freedom and reduces the deviance by only 5.3, whereas department uses 6 degrees of freedom and reduces it by 196.1.
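The table above lists deviance reductions without a test. As a minimal sketch, the test argument of anova() can add an F test for each sequentially added term of a Gaussian GLM. The data frame toy, its columns, and its values are invented for illustration and are not the climate data.

```r
# Hypothetical data with the same structure as the model above
# (all values are invented).
set.seed(7)
toy <- data.frame(
  department = factor(rep(c("A", "B", "C"), each = 18)),
  year       = rep(2000:2017, times = 3)
)
toy$spending <- 1e8 * as.integer(toy$department) +
  2e6 * (toy$year - 2000) + rnorm(54, sd = 5e6)

fit <- glm(year ~ spending + department, family = gaussian, data = toy)

# Terms are added first to last; each row tests the term given those above it.
dev_table <- anova(fit, test = "F")
print(dev_table)
```

Because the terms are added sequentially, changing their order in the formula can change the rows of the table.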
plot (glm_climate)
The diagnostic plots for the climate GLM are shown above. In this model the NASA department is significantly associated with year; the GLM is used to describe which factors influence the response variable.
Read the energy spending CSV data from the working directory:
energy <- read.csv("energy_spending.csv", header = TRUE)
summary(energy)
## department year
## Adv Sci Comp Res* : 22 Min. :1997
## Atomic Energy Defense : 22 1st Qu.:2002
## Basic Energy Sciences* : 22 Median :2008
## Bio and Env Research* : 22 Mean :2008
## Energy Efficiency and Renew Energy: 22 3rd Qu.:2013
## Fossil Energy : 22 Max. :2018
## (Other) :110
## energy_spending
## Min. :5.690e+07
## 1st Qu.:4.762e+08
## Median :6.948e+08
## Mean :1.456e+09
## 3rd Qu.:1.357e+09
## Max. :7.574e+09
##
attach (energy)
## The following objects are masked from climate:
##
## department, year
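The masking message above appears because both attached data frames contain columns named department and year, so a bare name such as year now refers to the most recently attached data frame. A minimal sketch of referring to columns through the data frame itself, which avoids the ambiguity, follows. The two small data frames are invented for illustration.

```r
# Two hypothetical data frames that share column names (values are invented).
climate_toy <- data.frame(year = 2000:2002, spending = c(1, 2, 3))
energy_toy  <- data.frame(year = 1997:1999, spending = c(10, 20, 30))

# Columns are reached through their data frame, so nothing is masked.
climate_mean <- mean(climate_toy$spending)
energy_mean  <- with(energy_toy, mean(spending))
print(c(climate = climate_mean, energy = energy_mean))
```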
str (energy)
## 'data.frame': 242 obs. of 3 variables:
## $ department : Factor w/ 11 levels "Adv Sci Comp Res*",..: 11 1 3 4 7 8 10 5 9 6 ...
## $ year : int 1997 1997 1997 1997 1997 1997 1997 1997 1997 1997 ...
## $ energy_spending: num 3.59e+09 2.17e+08 9.33e+08 5.51e+08 3.31e+08 ...
names (energy)
## [1] "department" "year" "energy_spending"
Scatter plot of energy_spending against year, colored by department:
ggplot(energy, aes(x = year, y = energy_spending, color = department)) +
  geom_point()
The figure shows that energy spending by Atomic Energy Defense increased markedly from 1997 to 2018. Spending by Adv Sci Comp Res increased as well, although it has decreased since 2010.
Box plot of energy_spending against year, colored by department:
ggplot(energy, aes(x = year, y = energy_spending, color = department)) +
  geom_boxplot()
The box plot shows large differences for Atomic Energy Defense, drawn at 2000, and also for Adv Sci Comp Res, drawn at 2016, compared with the other departments over the 1997 to 2018 series.
Line plot of energy_spending against year, colored by department:
ggplot(energy, aes(x = year, y = energy_spending, color = department)) +
  geom_line()
The figure shows that spending by Atomic Energy Defense fluctuates from 1997 until 2017, as does spending by Office of Science R&D. Both have high energy spending compared with the other departments. The data were recorded from 1997 until 2018.
# color is mapped in geom_point() only, so the fit uses all years in a panel
ggplot(energy, aes(x = year, y = energy_spending)) +
  geom_point(aes(color = factor(year))) + scale_x_log10() +
  geom_smooth(method = "lm") + facet_wrap(~department)
The figure shows that Atomic Energy Defense has high energy spending throughout 1997 to 2018. Likewise, Office of Science R&D has high values over the same period.
# color is mapped in geom_point() only, so the fit uses all years in a panel
ggplot(energy, aes(x = year, y = energy_spending)) +
  geom_point(aes(color = factor(year))) + scale_x_log10() +
  geom_smooth(method = "glm") + facet_wrap(~department)
The figure shows no difference from the lm plot above: the highest values again belong to two departments, Atomic Energy Defense and Office of Science R&D.
glm_energy <- glm(year ~ energy_spending + department, family = gaussian, data = energy)
summary (glm_energy)
##
## Call:
## glm(formula = year ~ energy_spending + department, family = gaussian,
## data = energy)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -11.7463 -4.0891 -0.0892 4.2582 10.4473
##
## Coefficients:
## Estimate Std. Error
## (Intercept) 2.004e+03 1.216e+00
## energy_spending 8.968e-09 9.140e-10
## departmentAtomic Energy Defense -4.227e+01 4.612e+00
## departmentBasic Energy Sciences* -1.015e+01 1.945e+00
## departmentBio and Env Research* -2.321e+00 1.664e+00
## departmentEnergy Efficiency and Renew Energy -6.038e+00 1.759e+00
## departmentFossil Energy -1.169e+00 1.652e+00
## departmentFusion Energy Sciences* -7.648e-02 1.647e+00
## departmentHigh-Energy Physics* -4.555e+00 1.712e+00
## departmentNuclear Energy -5.946e-01 1.649e+00
## departmentNuclear Physics* -1.384e+00 1.653e+00
## departmentOffice of Science R&D -3.749e+01 4.161e+00
## t value Pr(>|t|)
## (Intercept) 1648.525 < 2e-16 ***
## energy_spending 9.812 < 2e-16 ***
## departmentAtomic Energy Defense -9.165 < 2e-16 ***
## departmentBasic Energy Sciences* -5.220 4e-07 ***
## departmentBio and Env Research* -1.394 0.164521
## departmentEnergy Efficiency and Renew Energy -3.433 0.000707 ***
## departmentFossil Energy -0.708 0.479881
## departmentFusion Energy Sciences* -0.046 0.963015
## departmentHigh-Energy Physics* -2.661 0.008330 **
## departmentNuclear Energy -0.361 0.718681
## departmentNuclear Physics* -0.837 0.403358
## departmentOffice of Science R&D -9.011 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for gaussian family taken to be 29.85287)
##
## Null deviance: 9740.5 on 241 degrees of freedom
## Residual deviance: 6866.2 on 230 degrees of freedom
## AIC: 1522.4
##
## Number of Fisher Scoring iterations: 2
The GLM shows that energy_spending and five departments are significantly associated with year: Atomic Energy Defense, Basic Energy Sciences, Energy Efficiency and Renew Energy, High-Energy Physics, and Office of Science R&D. The stars mark the significance level of each coefficient.
anova(glm_energy)
## Analysis of Deviance Table
##
## Model: gaussian, link: identity
##
## Response: year
##
## Terms added sequentially (first to last)
##
##
## Df Deviance Resid. Df Resid. Dev
## NULL 241 9740.5
## energy_spending 1 152.03 240 9588.5
## department 10 2722.31 230 6866.2
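The analysis of deviance table shows that energy_spending uses 1 degree of freedom and reduces the deviance by 152.03, whereas department uses 10 degrees of freedom and reduces it by 2722.31. The residual deviance is therefore higher after adding energy_spending alone (9588.5) than after also adding department (6866.2).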
plot (glm_energy)
The generalized linear model is used here to estimate which factors are associated with the time variable (year) from 1997 to 2018.
rd <- read.csv ("fed_r_d_spending.csv", header = TRUE)
library(ggplot2)
summary (rd)
## department year rd_budget total_outlays
## DHS : 42 Min. :1976 Min. :0.000e+00 Min. :3.718e+11
## DOC : 42 1st Qu.:1986 1st Qu.:9.020e+08 1st Qu.:9.904e+11
## DOD : 42 Median :1996 Median :1.888e+09 Median :1.581e+12
## DOE : 42 Mean :1996 Mean :1.035e+10 Mean :1.880e+12
## DOT : 42 3rd Qu.:2007 3rd Qu.:1.206e+10 3rd Qu.:2.729e+12
## EPA : 42 Max. :2017 Max. :9.432e+10 Max. :3.982e+12
## (Other):336
## discretionary_outlays gdp
## Min. :1.756e+11 Min. :1.790e+12
## 1st Qu.:4.385e+11 1st Qu.:4.536e+12
## Median :5.460e+11 Median :8.230e+12
## Mean :6.942e+11 Mean :9.175e+12
## 3rd Qu.:1.042e+12 3rd Qu.:1.432e+13
## Max. :1.347e+12 Max. :1.918e+13
##
attach (rd)
## The following objects are masked from energy:
##
## department, year
## The following objects are masked from climate:
##
## department, year
str (rd)
## 'data.frame': 588 obs. of 6 variables:
## $ department : Factor w/ 14 levels "DHS","DOC","DOD",..: 3 9 4 7 10 11 13 8 5 6 ...
## $ year : int 1976 1976 1976 1976 1976 1976 1976 1976 1976 1976 ...
## $ rd_budget : num 3.57e+10 1.25e+10 1.09e+10 9.23e+09 8.02e+09 ...
## $ total_outlays : num 3.72e+11 3.72e+11 3.72e+11 3.72e+11 3.72e+11 ...
## $ discretionary_outlays: num 1.76e+11 1.76e+11 1.76e+11 1.76e+11 1.76e+11 ...
## $ gdp : num 1.79e+12 1.79e+12 1.79e+12 1.79e+12 1.79e+12 ...
names (rd)
## [1] "department" "year" "rd_budget"
## [4] "total_outlays" "discretionary_outlays" "gdp"
typeof (rd$rd_budget)
## [1] "double"
typeof (rd$total_outlays)
## [1] "double"
typeof (rd$discretionary_outlays)
## [1] "double"
typeof (rd$gdp)
## [1] "double"
Plotting the rd data (x = year, y = rd_budget), colored by total_outlays:
ggplot(rd, aes(x = year, y = rd_budget, color = total_outlays)) +
  geom_point()
The figure shows that rd_budget increased over time from 1976 to 2017. The color scale shows that total outlays increased over time as well.
Plotting the rd data (x = year, y = rd_budget), colored by gdp:
ggplot(rd, aes(x = year, y = rd_budget, color = gdp)) +
  geom_point()
The figure shows that rd_budget increased over time, and that gdp increased over time as well.
Plotting the rd data (x = year, y = rd_budget), colored by discretionary_outlays:
ggplot(rd, aes(x = year, y = rd_budget, color = discretionary_outlays)) +
  geom_point()
The figure shows that rd_budget increased over time, together with discretionary_outlays.
Plotting the rd data (x = year, y = rd_budget), colored by department:
ggplot(rd, aes(x = year, y = rd_budget, color = department)) +
  geom_point()
The figure shows that the DOD budget fluctuates over time (1976 to 2017) and is high compared with the other departments.
Plotting gdp against year with a linear-model fit, one panel per department:
ggplot(rd, aes(x = year, y = gdp)) +
  geom_point() + geom_smooth(method = "lm") + facet_wrap(~department)
The linear fit shows that gdp increases over time in every department panel. gdp is a yearly value repeated for each department, so all panels show the same series.
Plotting rd_budget against year with a linear-model fit, one panel per department:
ggplot(rd, aes(x = year, y = rd_budget)) +
  geom_point() + geom_smooth(method = "lm") + facet_wrap(~department)
The figure shows that the DOD budget fluctuates over time. Its linear fit has the steepest slope, that is, the largest increase in budget over time. HHS and NIH also increase more than the other departments, but DOD shows the largest increase.
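The slopes compared by eye in the faceted plot can also be computed directly. The sketch below fits one linear model per department and extracts the coefficient of year. The data frame toy, the chosen departments, and all values (including true_slope) are invented for illustration and are not estimates from the rd data.

```r
# Hypothetical budgets with a known slope per department (values are invented).
set.seed(3)
toy <- data.frame(
  department = rep(c("DOD", "NIH", "EPA"), each = 10),
  year       = rep(2008:2017, times = 3),
  stringsAsFactors = FALSE
)
true_slope <- c(DOD = 1.0e9, NIH = 5.0e8, EPA = 1.0e7)
toy$rd_budget <- 1e10 + true_slope[toy$department] * (toy$year - 2008) +
  rnorm(30, sd = 1e8)

# Fit rd_budget ~ year separately in each department and keep the slope.
slopes <- sapply(split(toy, toy$department), function(d) {
  coef(lm(rd_budget ~ year, data = d))[["year"]]
})
print(sort(slopes, decreasing = TRUE))
```

Each slope is the estimated change in budget per year, which is what the steepness of the fitted line in each panel represents.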
Plotting total_outlays against year with a linear-model fit, one panel per department:
ggplot(rd, aes(x = year, y = total_outlays)) +
  geom_point() + geom_smooth(method = "lm") + facet_wrap(~department)
The figure shows that total outlays increase over time and that the linear model fits the data well, which indicates a clear upward trend in every department panel.
Plotting discretionary_outlays against year with a linear-model fit, one panel per department:
ggplot(rd, aes(x = year, y = discretionary_outlays)) +
  geom_point() + geom_smooth(method = "lm") + facet_wrap(~department)
The figure shows that discretionary outlays increase over time in every department panel and that the linear model fits the data well, which indicates a clear upward trend.
Plotting total_outlays over time with a GLM fit, colored by gdp:
ggplot(rd, aes(x = year, y = total_outlays, color = gdp)) +
  geom_point() + scale_x_log10() + geom_smooth(method = "glm")
The figure shows that total outlays increase over time. The model fits the data well, which indicates a clear upward trend.
Plotting total_outlays over time with a GLM fit, colored by rd_budget:
ggplot(rd, aes(x = year, y = total_outlays, color = rd_budget)) +
  geom_point() + scale_x_log10() + geom_smooth(method = "glm")
The colors show that rd_budget stays low over time, while total outlays increase. The GLM line fits the data well, which indicates a clear upward trend.
The GLM analysis assesses which factors influence total outlays:
glm_rd <- glm(total_outlays ~ year + rd_budget + department + gdp + discretionary_outlays, family = gaussian, data = rd)
summary (glm_rd)
##
## Call:
## glm(formula = total_outlays ~ year + rd_budget + department +
## gdp + discretionary_outlays, family = gaussian, data = rd)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -2.647e+11 -2.075e+10 1.154e+10 4.934e+10 3.322e+11
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.142e+13 5.447e+12 5.769 1.31e-08 ***
## year -1.599e+10 2.758e+09 -5.800 1.10e-08 ***
## rd_budget -2.598e+00 7.958e-01 -3.265 0.00116 **
## departmentDOC 2.215e+09 2.324e+10 0.095 0.92411
## departmentDOD 1.671e+11 5.620e+10 2.973 0.00308 **
## departmentDOE 2.989e+10 2.497e+10 1.197 0.23182
## departmentDOT 1.400e+09 2.324e+10 0.060 0.95198
## departmentEPA 9.650e+08 2.323e+10 0.042 0.96688
## departmentHHS 5.694e+10 2.905e+10 1.960 0.05047 .
## departmentInterior 1.355e+09 2.323e+10 0.058 0.95351
## departmentNASA 3.056e+10 2.505e+10 1.220 0.22297
## departmentNIH 5.388e+10 2.850e+10 1.891 0.05917 .
## departmentNSF 9.508e+09 2.341e+10 0.406 0.68482
## departmentOther 2.899e+09 2.325e+10 0.125 0.90082
## departmentUSDA 5.201e+09 2.329e+10 0.223 0.82335
## departmentVA 9.220e+08 2.323e+10 0.040 0.96836
## gdp 1.489e-01 7.184e-03 20.722 < 2e-16 ***
## discretionary_outlays 1.472e+00 4.689e-02 31.396 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for gaussian family taken to be 1.133344e+22)
##
## Null deviance: 7.1508e+26 on 587 degrees of freedom
## Residual deviance: 6.4601e+24 on 570 degrees of freedom
## AIC: 31548
##
## Number of Fisher Scoring iterations: 2
The results show that the significant factors are year, rd_budget, department DOD, gdp, and discretionary_outlays. The stars mark coefficients that are significantly associated with the response variable.
Plotting the GLM diagnostics:
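Reading the stars by eye is error-prone in a long coefficient table. As a minimal sketch, the p-values can be extracted from the summary and filtered at the 0.05 level. The data frame toy and its variables are invented for illustration; x2 is constructed to have no true effect.

```r
# Hypothetical data: y depends on x1 but not on x2 (values are invented).
set.seed(11)
n <- 200
toy <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
toy$y <- 3 * toy$x1 + rnorm(n)

fit <- glm(y ~ x1 + x2, family = gaussian, data = toy)

# The coefficient table is a matrix; the last column holds the p-values.
coef_table  <- coef(summary(fit))
p_values    <- coef_table[, "Pr(>|t|)"]
significant <- names(p_values)[p_values < 0.05]
print(significant)
```

Applied to a fitted model such as glm_rd, the same three lines would list the terms whose p-values fall below 0.05.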
plot(glm_rd)
anova(glm_rd)
## Analysis of Deviance Table
##
## Model: gaussian, link: identity
##
## Response: total_outlays
##
## Terms added sequentially (first to last)
##
##
## Df Deviance Resid. Df Resid. Dev
## NULL 587 7.1508e+26
## year 1 6.7994e+26 586 3.5138e+25
## rd_budget 1 2.9410e+20 585 3.5138e+25
## department 13 2.6820e+21 572 3.5136e+25
## gdp 1 1.7504e+25 571 1.7631e+25
## discretionary_outlays 1 1.1171e+25 570 6.4601e+24
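The analysis of deviance table shows that year accounts for by far the largest reduction in deviance (6.7994e+26 of the null deviance of 7.1508e+26), followed by gdp (1.7504e+25) and discretionary_outlays (1.1171e+25); rd_budget and department contribute very little. The residual deviance is lowest, 6.4601e+24, after discretionary_outlays is added.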
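The three models can be compared by the share of the null deviance that each one removes. The sketch below uses only the null and residual deviances printed in the three summary() outputs above; the object names are illustrative.

```r
# Null and residual deviances copied from the three summary() outputs above.
deviances <- data.frame(
  model    = c("glm_climate", "glm_energy", "glm_rd"),
  null     = c(3391.5, 9740.5, 7.1508e+26),
  residual = c(3190.0, 6866.2, 6.4601e+24)
)

# Share of the null deviance removed by each model.
deviances$explained <- 1 - deviances$residual / deviances$null
print(transform(deviances, explained = round(explained, 3)))
```

The shares come to roughly 6% for glm_climate, 30% for glm_energy, and 99% for glm_rd.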